Ranjie Duan

MESA: Improving MoE Safety Alignment via Decentralized Expertise

May 30, 2026
Via arXiv

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

May 28, 2026
Via arXiv

Adversarial Orthogonal Disentanglement for LVLM Hallucination Mitigation

May 25, 2026
Via arXiv

Knowledge-Guided Adversarial Training for Infrared Object Detection via Thermal Radiation Modeling

Mar 26, 2026
Via arXiv

Improving Safety Alignment via Balanced Direct Preference Optimization

Mar 24, 2026
Via arXiv

Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search

Feb 26, 2026
Via arXiv

Pruning as a Cooperative Game: Surrogate-Assisted Layer Contribution Estimation for Large Language Models

Feb 08, 2026
Via arXiv

YuFeng-XGuard: A Reasoning-Centric, Interpretable, and Flexible Guardrail Model for Large Language Models

Jan 22, 2026
Via arXiv

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

Jan 16, 2026
Via arXiv

Towards Class-wise Fair Adversarial Training via Anti-Bias Soft Label Distillation

Jun 10, 2025
Via arXiv